> [talk of being able to change objc_msgSend through a function pointer]
You have my vote for flexibility in the message-sending mechanism. My only concern is efficiency: an extra pointer dereference per method call may or may not be negligible. One cool way to possibly avoid this problem is to have the objc_msgSend function be rld'ed (runtime linked) from someplace (e.g., a section of the executable image, a file specified in the environment, whatever). This way, you get to put in the function you want, and it is exactly as efficient as it used to be. This method has its disadvantages, though: how do you gain access to the old objc_msgSend?
Andrew Athan
Date: Sat, 19 Sep 92 15:01:02 -0400
From: athan@object.com (Andrew C . Athan)
To: gnu-objc@prep.ai.mit.edu
Subject: Good article on Method Dispatch technique
Check out September's "Journal of Object Oriented Programming" (an ACM SIGS publication). There's an article entitled "Efficient Algorithms for Method Dispatch" that presents methods possibly applicable to the GNU Obj-C runtime.
Andrew Athan
From: Steve_Naroff@NeXT.COM (Steve Naroff)
Date: Mon, 21 Sep 92 15:45:48 -0700
To: Geoffrey S. Knauth <gsk@marble.com>
Subject: extending message lookup vs. open coding
Cc: gnu-objc@prep.ai.mit.edu
Allowing applications to conveniently override objc_msgSend() is a very useful feature that underscores the benefits of having a central dispatcher. I don't like the idea of "open coding" calls to objc_msgSend(); it is not clear that this will result in any noticeable performance improvement (it could result in significant code expansion and hurt performance, who knows).
The pre/post operation example that Kim Christensen outlined sounds good; however, it only works for methods that return "id". Because objc_msgSend reuses the current stack frame rather than creating its own, implementing the post operation is complicated. All of these are implementation details that can be nailed down...
If the goal is making the message dispatcher easier to customize, writing the assembler magic to reuse the stack frame (without disrupting any registers used by the compiler) might be intimidating to many application writers. Separating "lookup" from "dispatch" removes the assembler hacking (though it makes the post-op functionality hard).
extern id (*objc_msgLookup(id, SEL))(id, SEL, ...);
// messaging separated into two distinct operations...
(*objc_msgLookup(obj, sel))(obj, sel, arg1);
It might be nice to override objc_msgSend on a per class basis as well, depending on the application.
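The two-step lookup/dispatch idea above can be sketched in plain C (all of the names and types below are invented for illustration; this is not the actual NeXT or GNU API). The lookup step is a replaceable function pointer, so a pre-operation hook can be installed with no assembler hacking:

```c
/* Sketch of "lookup separated from dispatch" (names assumed, not the
 * real runtime): the lookup step is an overridable function pointer,
 * so an application can interpose on message sends in plain C. */
typedef const char *SEL;            /* selector, modeled as a string here */
typedef struct objc_object *id;
typedef id (*IMP)(id, SEL);         /* pointer to a method implementation */

struct objc_object { const char *class_name; };

static int trace_count = 0;         /* counts interposed lookups */

static id identity_imp(id self, SEL sel) { (void)sel; return self; }

/* stand-in for the real table-driven lookup */
static IMP default_lookup(id obj, SEL sel) {
    (void)obj; (void)sel;
    return identity_imp;
}

/* the lookup step is an overridable function pointer... */
static IMP (*objc_msg_lookup)(id, SEL) = default_lookup;

/* ...so a pre-operation hook can wrap lookup, leaving dispatch alone */
static IMP traced_lookup(id obj, SEL sel) {
    trace_count++;                  /* pre-op: record the send */
    return default_lookup(obj, sel);
}

/* messaging as two distinct operations, as in the post above:
 *   (*objc_msg_lookup(obj, sel))(obj, sel);                     */
static id send_msg(id obj, SEL sel) {
    return (*objc_msg_lookup(obj, sel))(obj, sel);
}

static void install_tracing(void) { objc_msg_lookup = traced_lookup; }
```

Because the hook only replaces the lookup step, the dispatch (the actual call) stays a plain C function call, which is what makes the scheme portable.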
snaroff.
Date: Mon, 21 Sep 92 20:05:34 -0400
From: rms@gnu.ai.mit.edu (Richard Stallman)
To: Steve_Naroff@next.com
Cc: gsk@marble.com, gnu-objc@prep.ai.mit.edu
Subject: extending message lookup vs. open coding
In our runtime, dispatch and lookup are separate. That was easy to
implement in C.
My plan is to replace objc_msgSend with a couple of array and
structure refs. So far I don't see that replacing objc_msgSend
gives a real increase in power.
From: Steve_Naroff@NeXT.COM (Steve Naroff)
Date: Tue, 22 Sep 92 11:25:18 -0700
To: rms@gnu.ai.mit.edu (Richard Stallman)
Subject: Re: extending message lookup vs. open coding
Cc: gsk@marble.com, athan@object.com
> In our runtime, dispatch and lookup are separate. That was easy to
> implement in C.
This sounds good; I should probably learn more about the FSF runtime. Have you considered message forwarding (i.e., delegation)? This (unfortunately) cannot be done entirely in C. I hope the FSF runtime plans to implement forwarding; it is a powerful feature that can enable some interesting things (like Distributed Objects).
> My plan is to replace objc_msgSend with a couple of array and
> structure refs. So far I don't see that replacing objc_msgSend
> gives a real increase in power.
I'm not saying this is a panacea. In my mind, the increase in power comes down to enabling a new class of debug/analysis tools that can be written/used during development. The other is obviously changing the semantics of messaging, which is less important to me personally.
Ultimately, the decision should be made by weighing the performance win by open coding the dispatcher vs. the flexibility of being able to replace the dispatcher.
snaroff.
Date: Tue, 22 Sep 92 15:08:28 -0400
From: rms@gnu.ai.mit.edu (Richard Stallman)
To: Steve_Naroff@NeXT.COM
Cc: gsk@marble.com, athan@object.com
Subject: extending message lookup vs. open coding
> I hope the FSF runtime plans to implement forwarding; it is a
> powerful feature that can enable some interesting things (like
> Distributed Objects).
Sorry, this is beyond what I have time to think about.
> I'm not saying this is a panacea. In my mind, the increase in power
> comes down to enabling a new class of debug/analysis tools that can
> be written/used during development.
I think this can be done anyway, by modifying the dispatch tables.
Date: Tue, 22 Sep 92 06:40:52 -0700
From: Dennis Glatting <seattle-ni-srvr!dglattin@trirex.COM>
To: uunet!next.com!Steve_Naroff@uunet.UU.NET (Steve Naroff)
Subject: extending message lookup vs. open coding
Cc: Geoffrey S. Knauth <gsk@marble.com>, gnu-objc@prep.ai.mit.edu
> Allowing applications to conveniently override
> objc_msgSend() is a very useful feature that
> underscores the benefits of having a central
> dispatcher. I don't like the idea of "open coding" calls
> to objc_msgSend(), it is not clear that this will result
> in any noticeable performance improvement (it could
> result in significant code expansion and hurt
> performance, who knows).
The GNU version of objc_msgSend returns a method address. Therefore, the compiler produces code in this fashion: stack the *method's* arguments; stack objc_msgSend's arguments; call objc_msgSend; clean up the stack; then call the method. One problem, however -- although the problem may be philosophical -- is that this impacts defer-pop optimization.
The GNU objc_msgSend works by array lookup. The advantage of this is speed (the array operations in the runtime are nicely optimized by the compiler, too). Therefore, what we trade away (at the cost of code bulk) by open coding the method dispatch is the additional stacking, a subroutine call, and some stack cleanup, versus an array lookup.
I suspect that, if we open code method dispatch, we'll approach the theoretical minimum dispatch overhead of 2 (currently it is 3).
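The trade-off can be sketched in C (the structure layout below is assumed for illustration; it is not the actual GNU runtime's). Open coding means the compiler emits the two memory references inline instead of calling a lookup routine:

```c
/* Sketch of out-of-line vs. open-coded dispatch (layout assumed,
 * not the real GNU runtime structures). */
typedef void (*IMP)(void);          /* a method implementation pointer */

struct objc_class {
    IMP *dispatch_table;            /* indexed by selector number */
};

struct objc_object {
    struct objc_class *isa;         /* class pointer in every object */
};

static void example_imp(void) {}    /* a dummy method implementation */

/* out-of-line form: one extra subroutine call per message send */
IMP objc_msg_lookup(struct objc_object *obj, unsigned sel_no) {
    return obj->isa->dispatch_table[sel_no];
}

/* open-coded form: the same two memory references, emitted inline,
 * so a send collapses to "index, then call the result" */
#define OPEN_CODED_LOOKUP(obj, sel_no) \
    ((obj)->isa->dispatch_table[(sel_no)])
```

Both forms perform the same indexing; the open-coded macro just avoids the call/return and argument stacking around the lookup, at the cost of repeating those instructions at every send site.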
[paragraph deleted ]
> If the goal is making the message dispatcher easier to
> customize, writing the assembler magic to reuse the
> stack frame (and not disrupt any registers used by the
> compiler) might be intimidating to many application
> writers. Separating "lookup" from "dispatch" removes
> the assembler hacking (however makes the post op
> functionality hard).
The design of the runtime is purposely portable (i.e., no assembly language). There are several problems with assembly language: code maintenance, and dealing with various processor architectures -- not all processors are stack based.
-dpg
Date: Wed, 23 Sep 92 23:16:16 -0700
From: Dennis Glatting <seattle-ni-srvr!dglattin@uunet.UU.NET>
To: Geoffrey S. Knauth <uunet!marble.com!gsk@uunet.UU.NET>
Subject: Re: extending message lookup vs. open coding
> I found myself wondering what would happen if NeXT
> messaging had 3:1 overhead, and your system had only 2:1.
> It seems there's room for everyone to improve.
>
Steve may be right, but open coding should be evaluated.
-dpg
Date: Sun, 21 Feb 1993 21:34:10 +0100
From: Kresten Krab Thorup <krab@iesd.auc.dk>
Reply-To: gnu-objc@gnu.ai.mit.edu
To: gnu-objc@gnu.ai.mit.edu
Subject: How to do lookup (or: Is the simple array approach acceptable?)
Cc: krab@iesd.auc.dk, rms@gnu.ai.mit.edu
The currently distributed runtime uses a very simple array mechanism
for method lookup. Each class has an array which is as large as the
maximum number of selectors in any class. The lookup is then simply
to get the element at the index corresponding to the selector number.
As noted by several people, this approach is probably not good
enough if there are a lot of selectors in the system, since it will
take a lot of space. Furthermore, if dynamic loading is implemented
some time in the future, we will have to realloc all these dispatch
arrays when we load in new code, to make room for the new selectors.
An alternative to the simple-minded full-size array would be the
following data structure. I don't know if it already has a name,
since I invented it myself; I will call it a `doubly indexed array'.
Using this structure we can probably save a lot of space, and time
in the initialization phase. It will, however, cost a bit more to do
the lookup. Is this what is known in gcc as an `obarray'? Maybe;
we'll see...
The basic idea is that a DIarray keeps track of small intervals
(kept in buckets) of the full range of the array. If such a bucket
is empty, it will point to a special bucket, the `empty_bucket'.
Hence, an array with no elements in it will occupy only one bucket,
no matter how big it is.
Besides, it is relatively cheap to realloc such an array, since we
only have to realloc the first indirect table. If the bucket size is
kept relatively small, most buckets will be empty when the structure
is used as a dispatch table in Objective-C, since any class will
implement only a subset of the known selectors.
Since it is convenient, I will describe the DIarray data structure as
an Objective-C class. The real implementation should of course be
done in C. The code is _not_ optimized, only coded for clarity!
@interface DIarray
{
  short max_elements;   /* elements in array */
  short bucket_size;    /* size of one bucket */
  id *empty_bucket;     /* the empty bucket */
  id **bucket_table;    /* array holding pointers to buckets */
}
@end
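A plain-C sketch of the same structure (names and helper functions assumed for illustration; not the eventual runtime code) shows the key trick -- every empty bucket is the one shared empty_bucket, copied only on first write:

```c
/* Sketch of the "doubly indexed array" (names assumed): buckets of
 * BUCKET_SIZE slots, with all empty buckets sharing one common
 * empty_bucket, so a sparse table costs one bucket plus the index. */
#include <stdlib.h>

#define BUCKET_SIZE 16

typedef void *ELEM;

struct DIarray {
    int   capacity;        /* total addressable elements */
    ELEM *empty_bucket;    /* shared, all-NULL bucket */
    ELEM **bucket_table;   /* capacity / BUCKET_SIZE bucket pointers */
};

struct DIarray *di_new(int capacity) {
    int nbuckets = (capacity + BUCKET_SIZE - 1) / BUCKET_SIZE;
    struct DIarray *a = malloc(sizeof *a);
    a->capacity = nbuckets * BUCKET_SIZE;
    a->empty_bucket = calloc(BUCKET_SIZE, sizeof(ELEM));
    a->bucket_table = malloc(nbuckets * sizeof(ELEM *));
    for (int i = 0; i < nbuckets; i++)
        a->bucket_table[i] = a->empty_bucket;  /* all buckets start empty */
    return a;
}

/* lookup: two indexing operations instead of one */
ELEM di_get(struct DIarray *a, int i) {
    return a->bucket_table[i / BUCKET_SIZE][i % BUCKET_SIZE];
}

/* insertion replaces the shared empty bucket on first write */
void di_set(struct DIarray *a, int i, ELEM e) {
    ELEM **slot = &a->bucket_table[i / BUCKET_SIZE];
    if (*slot == a->empty_bucket) {
        *slot = calloc(BUCKET_SIZE, sizeof(ELEM));
    }
    (*slot)[i % BUCKET_SIZE] = e;
}
```

Growing the array for dynamically loaded selectors then only means reallocating bucket_table and pointing the new entries at empty_bucket, which is the cheap-realloc property described above.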
> It sounds good. But please think of how this fits in with
> the multi-thread locking scheme that I designed.
As far as I can see, there are no serious problems concerning my new
tables in this direction.
However, I do see some problems in _general_ in your design which
have to be thought about. What happens if it is not only a new class
but a category (that is, a class add-on) which is loaded dynamically?
If the class to which the category adds methods has subclasses, we
will have to figure out in which order the class and its subclasses
should have their new dispatch tables enabled--since we don't have
locking on lookup, we'll have to work out the ordering. I don't have
an answer ready at hand; perhaps someone on the list does?
/Kresten
From: Steve_Naroff@NeXT.COM (Steve Naroff)
Date: Thu, 25 Feb 93 12:53:58 -0800
To: Kresten Krab Thorup <krab@iesd.auc.dk>
Subject: Re: Optimizing the GNU objc runtime [long]
Cc: gnu-objc@gnu.ai.mit.edu
Here is some data for a "typical" application on a NeXT (linked with the standard shared library of objects that we supply):
200-300 classes
6000-7000 method implementations
2000-3000 selectors (unique method names)
Many of these classes are "private"...nevertheless, it is important that the runtime scale to support a system of this magnitude. It seems you have already abandoned the idea of a per-class vector that has room for all possible selectors. This is good, since my "back of the envelope" calculation comes out to 3.2 megabytes to store all the caches (using 200 classes and 2000 selectors in the computation)! Let me know if I misinterpreted your proposal.
It sounds like the sparse array idea might reclaim much of this overhead. I would be interested in how much space that works out to be. The NeXT runtime typically consumes between 20-40 kilobytes for the per-class caches. The space occupied is proportional to the classes/methods used by the application. The simple rule is "pay if you play". In fact, even though there are 200-300 classes linked into an application, only 30-50% are typically used. I think any scalable system must incur minimal overhead (on a per-class basis) until the class is actually used.
Another method for doing dynamic dispatch (that is similar in spirit to your scheme) is to have a vector that is maintained on a per-app basis (not per-class) that has an entry for each selector. Each entry would contain a chain of classes that implement the selector...in practice, this chain is very short. A simple way to view this is to go from selector->class, rather than class->selector. I'm not convinced it satisfies your performance goals, but it is another way to look at the problem.
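This selector->class arrangement can be sketched in C (all names below are assumed for illustration; this is not an actual runtime's code). One per-application vector is indexed by selector number, and each entry holds a short chain of (class, implementation) pairs:

```c
/* Sketch of per-app selector->class dispatch (names assumed). */
#include <stddef.h>

typedef void (*IMP)(void);          /* a method implementation pointer */

struct sel_entry {                  /* one link in a selector's chain */
    const void *cls;                /* class implementing the selector */
    IMP imp;
    struct sel_entry *next;
};

static void demo_imp_a(void) {}     /* dummy implementations */
static void demo_imp_b(void) {}

#define MAX_SELECTORS 64
/* one vector for the whole application, not one per class */
static struct sel_entry *sel_table[MAX_SELECTORS];

void register_method(int sel_no, const void *cls, IMP imp,
                     struct sel_entry *link) {
    link->cls = cls;
    link->imp = imp;
    link->next = sel_table[sel_no]; /* push onto this selector's chain */
    sel_table[sel_no] = link;
}

/* lookup walks the (typically very short) chain for this selector */
IMP chain_lookup(int sel_no, const void *cls) {
    for (struct sel_entry *e = sel_table[sel_no]; e; e = e->next)
        if (e->cls == cls)
            return e->imp;
    return NULL;                    /* would fall back to a superclass search */
}
```

Space here is proportional to the number of implementations rather than classes-times-selectors, which is what makes the chains short in practice.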
I think it is a worthwhile goal to improve the performance of dispatching messages in Objective-C. Just be aware that it is a classic "time vs. space" problem...and saving time without carefully considering space doesn't work. Most computers are getting a lot faster without adding significantly more memory...this should influence the tradeoffs we make when designing the new runtime (my perspective, of course).
In the past, we have spent time optimizing memory utilization and improving locality of reference. This includes the data generated by the compiler as well as caches that are dynamically allocated (which are often "hot" for commonly used classes). We also developed a program called "objcopt" that improves launch time by building the selector hash table and "freeze drying" it in the executable for each shared library that we ship.
Given the portability goals of the project, I can't say that many of these apply to the GNU runtime...I just wanted to let you know what problems we work on wrt optimizing the NeXT Objective-C runtime.
btw...the NeXT runtime sends "+initialize" lazily; was the change to execute it before main() made intentionally?